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This paper studies an approximation method for the log-hkelihood 
function of a nonlinear diffusion process using the bridge of the diffu- 
sion. The main result (Theorem 1) shows that this approximation 
converges uniformly to the unknown likelihood function and can 
therefore be used efficiently with any algorithm for sampling from 
the law of the bridge. We also introduce an expected maximum like- 
lihood (EML) algorithm for inferring the parameters of discretely ob- 
served diffusion processes. The approach is applicable to a subclass 
of nonlinear SDEs with constant volatility and drift that is linear in 
the model parameters. In this setting, globally optimal parameters 
are obtained in a single step by solving a linear system. Simulation 
studies to test the EML algorithm show that it performs well when 
compared with algorithms based on the exact maximum likelihood 
as well as closed-form likelihood expansions. 

1. Introduction. In the natural and social sciences, diffusion processes 
are widely used as models for random phenomena that evolve continuously 
in time. They are popular because they arise as solutions of stochastic dif- 
ferential equations, which are natural probabilistic generalizations of the 
deterministic models described by ordinary differential equations. It is well 
known that if the data are recorded at discrete times, parametric inference 
for diffusions using maximum likelihood estimates is difficult, primarily be- 
cause it is usually impossible to find the corresponding likelihood function 
[see S0rensen (2004) for a review of methods of inference in the diffusion set- 
ting]. In this paper, we are concerned with the estimation of the parameters 
in the stochastic differential equation (SDE) 



(1.1) dXt = n{Xt,e)dt + dWt where /z(-,-):] 



Received February 2008; revised April 2009. 
AMS 2000 subject classifications. 62F12, 60J60. 

Key words and phrases. Nonlinear diffusion, maximum likelihood, EM algorithm, esti- 
mation, global optimization. 

This is an electronic reprint of the original article published by the 

Institute of Mathematical Statistics in The Annals of Statistics, 

2010, Vol. 38, No. 1, 215-245. This reprint differs from the original in pagination 

and typographic detail. 



1 



2 



A. MIJATOVIC AND P. SCHNEIDER 



is an arbitrary continuous (possibly) nonlinear function and Wt is a standard 
Brownian motion. In order to guarantee the existence of a nonexplosive solu- 
tion of (1.1), we need to assume that for each parameter value 9 G M^"*"^, the 
function n{-,9) :M— )-M is locally Lipschitz with linear growth [see Kloeden 
and Platen (1999), Chapter 4]. The task is to infer the vector of coefficients 
G G C in the drift fJ-{-,0) from K +1 observed realizations xq, . . . , xk 

of the diffusion Xt, where Q is some compact subset in the parameter space. 

When the exact likelihood function is available, the parameters can be 
determined by maximizing the joint likelihood of the observations. The true 
likelihood function is, however, available only in very few cases. A variety 
of approximations exist and are well documented in the literature [see Ai't- 
Sahalia (2002) and Jensen and Poulsen (2002), Hurn, Jeisman and Lindsay 
(2007), Schneider (2006) for empirical comparisons of the available methods]. 
A general method for parameter inference based on the EM algorithm is to 
maximize an approximate likelihood function, which can be defined if one 
can simulate the bridge of the diffusion in (1.1) (see Section 2 for the precise 
definition of this approximation). Recently, an exact simulation approach 
for diffusion bridges was developed in Beskos et al. (2006) and an efficient 
algorithm for sampling from bridges of ergodic diffusions was proposed in 
Bladt and S0rensen (2007). Either of these simulation methods can be used 
to define the approximate likelihood function mentioned above. The main 
theoretical contribution of the present paper is Theorem 1, which states that, 
under some additional regularity conditions on the drift n{-,0) in (1.1), the 
approximation for the likelihood function obtained by the simulation of the 
bridge of the diffusion in (1.1) is justified because it approximates uniformly 
the true likelihood. For the precise statement of the result, see Section 2. 

In this paper, we also propose a new algorithm for the inference of pa- 
rameters when diffusion (1.1) takes a simpler form, as given in (3.1). Our 
method circumvents the use of numerical optimization to determine the pa- 
rameters for diffusion models of the form (3.1). The estimation algorithm 
transforms the original problem into a related inference problem that has a 
unique global solution 9* G M^"^^which is obtained in a single step by solving 
a linear system of dimension (N + 1) x (N + 1) given in (3.5). By Theorem 1, 
the related inference problem approximates uniformly on compact subsets 
of the parameter space the original inference problem. We also show that 
the approximations of the expectations that feature in linear system (3.5) 
converge uniformly on bounded subsets of the parameter space as the time 
interval between consecutive data points goes to zero (i.e., the number of 
observations K + 1 goes to infinity). This property is implied by Theorem 
3. 

Diffusions that are not of the form (1.1) can often be transformed into 
the required structure by a well-known change-of-variable method [see (3.6) 
at the end of Section 3]. The constant diffusion coefficient requirement in 
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(1.1) is therefore not as restrictive as it may seem at first glance. Many of 
the widely used diffusion processes with state-dependent volatility fall into 
this class. The square root process and the flexible diffusion used in Ai't- 
Sahaha (1996) and Jones (2003b) [see (4.3) and (4.4) for the SDEs describing 
the models] can be dealt with in this fashion. The likelihood functions of 
both processes, conditional upon S&P 100 implied volatility index data, are 
analyzed using the EML algorithm in Section 4.2. In the case of the square 
root process, direct maximum likelihood estimation is also performed and 
it is shown that the parameter values obtained agree with the ones found 
using an algorithm based on the EML procedure. In this paper, we consider 
only one-dimensional diffusion processes, even though the EML algorithm 
can be applied to the multidimensional case without introducing additional 
computational complexity when the underlying process is reducible [see Ai't- 
Sahalia (2008) for the precise definition]. However, extending the result of 
Theorem 1 to higher dimensions is a much harder problem. 

The paper is organized as follows. Section 2 describes how to approximate 
the likelihood function and states our main theoretical result (Theorem 1). 
Section 3 defines and derives the EML algorithm for diffusions given by (3.1). 
Section 4 consists of two subsections: in Section 4.1, a comparison of the EML 
algorithm with exact maximum likelihood estimation and the analytic like- 
lihood approximation method from Ai't-Sahalia (2002) is performed; Section 
4.2 estimates the square root and flexible diffusion processes conditional on 
implied volatility data. Section 5 concludes the paper. The Appendices A 
and B contain the proofs of Theorems 1 and 3. 

2. The main result. Let xq, . . . , xk denote K + 1 realizations of the dif- 
fusion Xt given in (1.1), observed at times to,...,tK- To avoid notational 
complexity, we assume evenly spaced time intervals between consecutive 
data points: A = t^. — tyfc„i for all A; E {1, ... , K}. Since we are assuming that 
the drift /i(-,^) :M — )■ M is locally Lipschitz and has linear growth, SDE (1.1) 
has a weakly unique solution for any starting point xq in its domain and 
any parameter value S C M^+^. 

The starting point of our approach is the EM algorithm, which we now 
briefly review [see Dempster, Laird and Rubin (1977) or McLachlan and 
Krishnan (1997) for the general theory]. We begin by considering two con- 
secutive data points and then apply our analysis to the entire data set. 
Between two consecutive observations x^-i and x^ at times tk-i and tk, 
respectively, we introduce M — 1 evenly spaced auxiliary latent state vari- 
ables ui, . . . ,um-i and define no := x^-i, um '■=Xk- Note that the length 
of the time interval between Um-i and Um, for any m E {1, . . . ,M}, equals 
5 := A/M. Given two observed realizations, uq and um, the task is to find 
the parameter 6 = (oq, . . . , oat) such that the value '^{um \ uo,6) of the con- 
ditional transition density of the diffusion Xt is maximized. Consider the 
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joint likelihood tt{um, ■ ■ ■ ,ui \ uq, 9) of the variable um and the latent aux- 
iliary variables ui, . . . ,um-i conditional on uq. The Markov property of the 
diffusion Xt implies the following representation: 

M 

(2.1) 7r('UM,---,'"i I ^io, m "Um—li " ) ■ 

m=l 

In order to formulate the EM algorithm in our setting, we need to intro- 
duce the following notation. Let random variables Um ■= ^t^.^+m^ for all 
m £ {1, . . . , M — 1} correspond to the auxiliary states between the consec- 
utive observations and let the random vector U := (J7i, . . . , be the 
auxiliary state vector. The joint distribution of U is given by the law of the 
bridge of the diffusion Xt, which starts at time tk-i at the level uq and 
finishes at the level um at time tk, denoted by Qg (or, more accurately, 
qA,mo,"j\/^_ The subscript 9 signifies the dependence of this probability law 
on the parameters in the model. The EM algorithm starts with some feasible 
value ^0 of the parameter 9 and repeats the following two steps: 

E-step: determine the conditional expectation 9 i— >• ^Qg^ [log7r(?7, um \ uo,9)]; 
M.-step: maximize this expression with respect to 9. 

The function in the E-step of the algorithm is known as the complete likeli- 
hood. The important observation is that the expectation defining the com- 
plete likelihood is taken with respect to the distribution of the vector U, 
which is given by the law Qg^ of the diffusion bridge. With each iteration, 
the value n{uM \ uo,9) is increased and therefore the algorithm is guaran- 
teed to converge to a stationary point which, in some pathological cases, is 
not a local maximum [see McLachlan and Krishnan (1997) for convergence 
properties of the EM algorithm]. It is thus key to understanding the be- 
havior of the complete likelihood 9 i— )• IEq^^ [log7r(i7, um \ uo,9)] for any fixed 
parameter value 9n- 

There are two problems associated with the E-step of the algorithm in 
our setting. The first is that the joint density 'it{um, ■ ■ ■ ,ui \ uo,9) for the 
law of the process Xt given by SDE (1.1) cannot be obtained in closed form. 
The second problem is that the law of the bridge of the diffusion Xt (i.e., 
the process Xt conditional upon Xtf,_-^ = uq and Xt,, = um) which arises in 
the expectation is also unknown. 

Using an Euler scheme approximation for the solution of SDE (1.1), 
together with Markov property (2.1), one can obtain an approximation 
for the joint likelihood function Tr{uM , ■ ■ ■ ,ui \ uo,9) in the following way. 
Over any short time period of length 5, the Euler scheme approximates 
the solution Xt-\-s of SDE (1.1) at time t + 5, conditional upon the level 
Xt, by the normal random variable Xt + fi{Xt,9)5 + Ws with mean Xt + 
fi{Xt,9)6 and variance 6. Over a longer time period A, a succession of 
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such normal variables is used to approximate the original process [see Kloe- 
den and Platen (1999), Section 10.2, for the general theory and conver- 
gence properties of Euler schemes for SDEs]. Each transition density T^{um \ 
Um-i,d), m £ {1,...,M}, in (2.1) can therefore be approximated by the 
normal density (j){um',Um-i + tJ'{um-i,6)6,6) with mean Um-i + n{um-i,0)S 
and variance ^ = defined above. Identity (2.1) implies that an approxi- 
mation for the joint likelihood Tr{uM, ■ ■ ■ ,ui \uo,6) is given by the product 
nm=i 4'ium]Um-i + IJ-ium-1, 0)6, 6) . TMs approximation is useful because it 
depends explicitly [through the known drift function fi{-,9)] on the model 
parameter 6 and has been used in Pedersen (1995) and Brandt and Santa- 
Clara (2002) to obtain an approximation for the transition density. The 
method, known as simulated maximum likelihood (SML), is based on the 
following convergence result which holds under global Lipschitz and linear 
growth conditions [see Theorem 2 in Pedersen (1995)]: 

7r{uM \uo,9) 



The SML algorithm uses this relationship to obtain an approximation for 
the likelihood function directly. As we will now show, it is possible to circum- 
vent the difficult issue of the computation of high-dimensional integrals and 
obtain optimal parameter values without having to compute approximations 
of transition densities. 

In the same spirit as in Jones (1998), Eraker (2001), Elerian, Chib and 
Shephard (2001) and Roberts and Stramer (2001), the auxiliary data points 
introduced at the beginning of this section are there to exploit the con- 
vergence of the discrete-time approximation to the diffusion Xf. As shown 
above, the main problem is to find the maximum of the complete likeli- 
hood 6 ^KQg^[log7r{U, um I uq,9)] for any value in the parameter space. 
Instead of doing this, we solve an "approximate" problem, where the func- 
tion Tr{uM, ■ ■ ■ ,ui I uo,6) under the expectation is replaced by the density 
nm=i 4'{um',Um-i + fJ-{um~i,())6, 6) of the Eulcr scheme approximation. The 
approximate likelihood function can then be obtained as soon as we can 
simulate the trajectories of the diffusion bridge [using, e.g., the algorithm 
in Beskos et al. (2006) or Bladt and S0rensen (2007)] governed by the law 
Qbq- In Section 3, we will show that the approximate problem has a unique 
maximum that can be obtained as a solution of a linear system of size 
{N + 1) X (A^ + 1)) where A^ + 1 is the dimension of the parameter 9, if the 
underlying diffusion takes the form (3.1). 

A natural question that arises at this point concerns the quality of the 
approximation of the complete likelihood by the sequence of functions {9 i— )■ 



(2.2) 




■m=l 
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^Qbo Em=i log 4>{Um; Um-1 + ^ fJ.{Um-~i, 0) , ^)])MeN- Numerical experiments 
in Section 4 suggest that this approximation works weh. Under some addi- 
tional regularity hypothesis on the drift fi{-,6), this intuitive claim can be 
justified by the following theorem. 

Theorem 1. Suppose that, in addition to the local Lipschitz condition 
with linear growth, we also assume that the function ;u:M x M^''"^ — )■ M m 
(1-1) is twice differentiahle in the state variable with hounded second deriva- 
tive. Let 6q be a fixed value in the parameter space. The following equality 
then holds: 



for all in the parameter space, where (j){y;x,5) is the normal density func- 
tion with mean x and variance 6. Furthermore, the limit is uniform in 9 on 
each compact subset of the parameter space. 

Note that the nature of Theorem 1 is fundamentally different from that of 
the result (2.2) above, proved in Pedersen (1995), because the expectation 
in the theorem is taken with respect to the law of the bridge of the diffusion 
Xt, rather than the law of the diffusion Xt itself (i.e., conditional upon 
Xti^ = um = xt^ and Xtf^_^ = uq = xt^_-^). The proof of Theorem 1 is found 
in Appendix A. It is based on the fact that in the one-dimensional case, 
there exists an explicit formula for the transition density of the diffusion in 
terms of the Brownian bridge [see Rogers (1985)]. Because this is a special 
property of the one-dimensional case, the proof does not easily generalize to 
the multidimensional setting. 

The main contribution of Theorem 1 is that it provides the theoretical 
basis for using the approximate complete likelihood described above with 
any method capable of simulating the diffusion bridge of the process defined 
by (1.1), including the algorithms in Beskos et al. (2006) and Bladt and 
S0rensen (2007), provided the regularity conditions on the drift are met. 
In Section 3, we use Theorem 1 to justify a key step in a new estimation 
algorithm for discretely observed diffusions. 

3. Expected maximum likelihood (EML) algorithm. In this section, we 
are concerned with the estimation of the parameters in the SDE 

N 



(3.1) dXt = n{Xt,e)dt + dWt where fi{x, 6) := g{x) + ^aifi{x), 
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driven by the standard Brownian motion Wt- The drift 0) : M — ^ M is 
given by an arbitrary family of independent, possibly nonlinear, Lipschitz 
functions /j : M — )• M with linear growth. The task is to infer the vector 
of coefficients 9 := (ao, . . . ,aAr) G 9 C M^+i in the drift ^{-,6) from K + 1 
observed realizations, xq, . . . ,xk, of the diffusion Xt. As we shall see, the 
EML algorithm consists of solving a linear system of size (A^ + 1) x {N + 1) 
given in (3.5) and converges to the global maximum in a single step. 

Having constructed the approximation to the complete likelihood in Sec- 
tion 2, we now turn to the initial estimation problem. By the M-step of 
the EM algorithm, our task is to maximize the approximate complete like- 
lihood function 9 i-^ Eq^^lY^f^^^logcpiUrn^Um-i + Sfi{Um-i,0),d)] for any 
fixed value of the model parameter ^o- The following obvious proposition is 
crucial to all that follows. 

Proposition 2. The complete likelihood 6* H> Eq^^ [^^^-^ log(/)(C/m; 
Um-i + Sfj,{Um.^i,9),5)] is a nondegenerate quadratic form with a unique 
global maximum. 

It is clear that the complete likelihood in Proposition 2 is a nondegenerate 
quadratic form in 0, bounded above by a constant, which implies that all 
of its eigenvalues must be negative. Therefore, there exists a unique global 
maximum. 

The following simple calculation will yield the globally optimal parameter 
value 6* = (qq, . . . , a^), which exists by Proposition 2. By setting the partial 
derivative with respect to each coordinate Oj, j G {0,...,A}, of 6 in the 
function given in Proposition 2 to zero, we obtain the linear system A0~^ = b, 
where 9 = (oq, . . . , on), A = dJ2m=i ^rn, b = J2m=i ^rn and, for any m G 
{1,...,M}, 

/EQ,J/o(C/„,_l)/o(C/;n-l)] ••• EQ,J/;v(f/m-l)/o(f/™-l)] 



(3.2) A 



(3.3) 



\EQ,J/o(C/;„_l)/7v(C/„^-l)] ••• EQ,J/;v(f/,n-l)/iv(C^m-l)], 
bin ■■= (EQso [(^- - - 5(^m-l)5)/o(f/™-l)] • • • 

X Eq„ [{Um - Um.^i - g{Um^l)5)fN{Um^l)]V. 



Since there exists a unique global maximum of the approximate complete 
likelihood, the inverse A~^ must also exist and the unique optimal parameter 
value is given by 9* = {A~^b)~^ . For K + 1 observations of the process Xt, the 
globally optimal parameter value 9* is obtained in the same way. The only 
difference is that matrix (3.2) and vector (3.3) are computed using M ■ K, 
rather than M, auxiliary and observed realizations [see (3.5)]. 
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The globally optimal value 6* of the parameter vector solves the linear 
system whose coefficients are yet to be determined. Computing the expec- 
tations Eq^^^ [•] in closed form is impossible because it requires the unknown 
transition density ■k{um~i-, ■ ■ ■ ■,ui\uM,UQ-,do) oi the bridge of the diffusion 
Xt . The key idea that helps to circumvent this problem is to replace the law 
of the bridge of Xt with the law of the corresponding Brownian bridge in 
all of the coefficients of matrix (3.2) and vector (3.3). The crucial additional 
benefit of this substitution is that it removes the dependence of the coeffi- 
cients of the linear system on the parameter Oq, which implies that the EM 
procedure terminates after only one iteration. Therefore, by Proposition 2, 
the EML algorithm is guaranteed to converge to the globally optimal pa- 
rameter value 6* in a single step. A recent parameter estimation algorithm 
for general one-dimensional diffusion models given in Beskos et al. (2006), 
based on a sophisticated sampling method known as retrospective sampling^ 
also employs the EM approach. The EM algorithm is also used in Bladt and 
S0rensen (2007), where the time-reversal symmetry of ergodic diffusions is 
exploited to sample from the corresponding bridge. Unlike in the case of the 
EML algorithm, in both of those settings, an iteration of E-step and M-step 
is required in order to obtain the stationary value of the model parameter. 

We now need to consider the quality of the weak approximation of the 
law of the bridge of the diffusion Xt (i.e., a process Xt conditioned to start 
at Xq = X and finish at X/s, = y, where A is the length of the time interval 
between consecutive observations in the data) by the law of the Brownian 
bridge (i.e., a Brownian motion Wt conditioned to start at Wq = x and finish 
at VFa = y)- This question is of importance because it tells us how far the 
coefficients of the linear system given by the matrix (3.2) and the target 
vector (3.3) are from the ones used in the EML algorithm (3.5). It is intu- 
itively clear that when A goes to zero, the Brownian bridge approximation 
must improve in quality. Since the law of the diffusion bridge is absolutely 
continuous with respect to the law of the Brownian bridge, it is possible 
to bound the approximation error explicitly in terms of A and the model 
parameter 6. 

Theorem 3. Assume that functions 5, : M — )■ M, i e {0, . . . , N}, in the 
drift of SDE (3.1) satisfy the linear growth condition and are twice differen- 
tiable with bounded second derivatives, and let G : M*^ — t- M 6e a polynomially 
bounded measurable function for some integer M G N. Let Q^'^'^ denote the 
law of the bridge, starting at x and finishing at y, of the diffusion Xt that 
solves SDE (3.1) with the parameter value 9 = (oq, . . . ,aAr) G M^+^ and let 
Wt denote the standard Brownian motion. The measure Q^'^'^ is then ab- 
solutely continuous with respect to the law of the corresponding Brownian 
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bridge W^'^'^ and the Radon-Nikodym derivative is given by 



where := exp (" ^ ^ {KWs,Of + fi'{Ws,9)) ds 



(a) The following inequality holds for all x,y in the domain of Xt and all 
times < ti < • • • < t A/ < A ; 

|E^A,... [G{Xt, ,...,XtJ]- EwA,.„ [G{Wt, ,...,WtJ]\^ 

2-\ 



< 



where \\G\\2 ■= Ei^a.^.h [G(VF(j , . . . , VFf^^)^]^/^ denotes the L'^-norm of the 
random variable G{Wt^ , ■ ■ ■ , ) ■ 
(b) LetS{e) ■.= snv,^^{^Ji{z,ef + ^l'{z,e)} andl{e) :=inf,eM{^(z,0)2 + ;x'(z, 
0)} be the maximum and minimum of the integrand in , respectively. 
The following inequality then holds: 



\&^A,.,y [G{Xt, ,...,XtJ]- EwA.... [G{Wt, ,...,Wt 



<i(exp(^(5(0)-/(^)))-l)||G|b. 

The absolute continuity of the measures Q^'^'^ and W^'^'^ is weh known 
and the form of the Radon-Nikodym derivative in Theorem 3 fohows from 
Lemma 1 in Beskos et al. (2006) and expressions (A.l) and (A. 2) in Ap- 
pendix A. The inequahty in part (a) of the theorem bounds the error arising 
from the approximation of the law Q^''^'^ by the measure W^'^'^ in terms 
of the variance of the Radon-Nikodym derivative and the L^-norm of the 
integrand. Since the latter is independent of the model parameter 0, this 
inequality provides a way of bounding the error for a general integrand G in 
terms of the second moment of the Radon-Nikodym derivative. In practice, 
the second moment can always be estimated by simulation, thus yielding a 
model-specific bound on the error of the coefficients in linear system (3.5) 
used in the EML algorithm. Figure 4 contains the graphs of the densities 
of the Radon-Nikodym derivative for the nonlinear SDE in (4.4) used in 
Section 4. A cursory inspection of the scale of the domains of these densi- 
ties shows how tight the bound in part (a) of Theorem 3 really is, even for 
relatively large time steps A. 
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It is intuitively clear that the Brownian bridge approximation works well 
for short time intervals A and less well as the time step grows. This view is 
supported by the inequality in part (b) of Theorem 3, which is a consequence 
of the bound in part (a). Furthermore, (b) implies that the approximation of 
the law Q^'^'^ by W^'^'^ is a good one, even for larger time steps A, provided 
that the drift f^{-,0) does not vary much as a function of the state. This 
implies that the method of approximation proposed in the EML algorithm 
would work well in the case of the diffusion with a periodic drift used in 
Example 1 in Beskos et al. (2006), for time steps A as large as ^. Also, 
note that the norm of the random variable G{Wt-^ ,■■■■> ^tA/) Hilbert 
space L^(W'^'^'^) is finite for a polynomially bounded function G because 
the law W^'^'^ of the Brownian bridge is Gaussian with bounded variance. 
The proof of Theorem 3 can be found in Appendix B. 

Having replaced the law of the diffusion bridge Q^'^'^ by the law of the 
Brownian bridge W^'^'^, which is independent of the parameter 6, we are 
left with the task of calculating the expectations in the coefficients of matrix 
(3.2) and vector (3.3). A numerical integration approach would be feasible 
because we have an explicit formula for the normal density of the marginals 
of the probability measure W^'^'^. However, because of the numerous two- 
dimensional integrals in (3.3), the problem does not lend itself well to this 
approach. 

An alternative approach is to simulate the paths of the Brownian bridge 
and use Monte Carlo methods to obtain the relevant expectations. This can 
be done by using the modified Brownian bridge sampler defined in Durham 
and Gallant (2002) and Chib and Shephard (2002), given by the following 
recursive formula: 



where 5 = A/M is the length of the time interval between consecutive aux- 
iliary states, no = x,um = y are the initial and final points of the Brow- 
nian bridge and Z^, ~ -^(0, 1) are independent random variables for all 
m G {1, . . . ,M — 1}. It is proved in Stramer and Yan (2007b) (see Propo- 
sition 1) that the joint density of the modified Brownian bridge equals the 
joint law of the Brownian bridge at the discretization times, which implies 
that scheme (3.4) introduces no discretization bias and is therefore prefer- 
able to the Euler approximation. Since the parameter 9 does not appear 
in the evolution equation (3.4) of the modified Brownian bridge, the EML 
algorithm can be described as follows. 

Let xq, . . . , xk be the K + \ observations at times t^, . . . oi the diffusion 
Xt given by SDE (3.1) and let A = tfc -ifc-i for all /c G {1, . . . , if}. Let M - 1 
be the number of the auxiliary state variables Umfc; £ {1,...,M — 1}, 



(3.4) 
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between the observed data points Xk-i and Xk such that uq/o = Xk~i and 
UMk = for all k G {1, . . . ,K}. Let S be the number of the simulations 
used. The EML algorithm then consists of the following simple steps. 

Step 1. For each k £ {1, . . . , K} and each s € {1, . . . , S}, generate a Brow- 
nian bridge path ('u^],)m=o,...,A/ using (3.4). 

Step 2. Find the unique solution of the linear system A9~^ = b, where 



(3.5) 



.KM S 

- ^ E E E f^(-tU)M-tU) and 

k=l m=l s=l 
K M S 

- E E E(-!:l - - 9{u^lik))h{n^lik) 

k=l m=l s=l 



with i, j € {0, . . . , A^}, to obtain the globally optimal parameter value 6* = 
(ag, . . . , a^). 

An appealing feature of the EML algorithm described above is that it cir- 
cumvents the iterative process that is ubiquitous in the general EM frame- 
work. The invertibility of matrix A is, by Proposition 2, equivalent to the 
nondegeneracy of the complete likelihood function, which is implied by the 
linear independence of the functions fi in the drift (3.1). Note that if auxil- 
iary state variables uik, ■ ■ ■ , UM-ik, for all /c G {1, ... , K}, are not introduced, 
then we can remove the expectation operators in (3.2) and (3.3) or, equiv- 
alently, the sums over s and m in step 2 of the above algorithm. In this 
case, the EML algorithm reduces to the classic linear regression. Under the 
conjugate normal prior, the estimates of the drift parameters then coincide 
with the posterior mean in a Bayesian analysis of the coefficients. 

We conclude this section with a brief comment about diffusion models 
with state-dependent diffusion functions. A scalar diffusion 

dXt = fiiXt, 6) dt + aiXt.-d) dWt 

can always be transformed using a change-of- variable Y = ^{X, d) = a(u,-d) • 
which depends on the diffusion parameter vector ■!?, into 

dYt = ^iY{Xue,d)dt + dWt 

(3.6) 

where /.Hl/,^,^) = ^(:pT(^;^- 2 9^(7 {v,^).^)- 

Note that if the original drift /x(-,0) is affine in the parameter 0, then so 
is the transformed drift /xy(-,^,??). Therefore, an application of the EML 
algorithm is feasible for any fixed value of the diffusion parameter {}. In 
Section 4, we will discuss how to apply the EML algorithm to diffusions of 
this kind [see the models given by (4.3) and (4.4)]. 
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4. Applications. There are at least two potential applications for the 
EML algorithm. The first is the classical parameter estimation problem for 
diffusion models. The advantage of the EML approach is that the resulting 
parameter estimates are globally optimal and the bias introduced through 
the Euler approximation is, by Theorem 1, arbitrarily small. The second 
application is based on the fact that the EML algorithm is computationally 
very fast. The speed of the algorithm enables one to easily explore the de- 
pendence of the likelihood function on the diffusion parameter 'd [see (3.6)], 
along with the dependence of the globally optimal drift parameter ^ as a 
function of i9 [see examples (4.3) and (4.4)]. Both of these applications will 
be illustrated in the present section. 

4.1. Base cases. To test the EML algorithm for potential biases arising 
in the Euler and the Brownian bridge approximations, we start by estab- 
lishing two base cases. The first case is an Ornstein-Uhlenbeck diffusion 
[see (4.1)], where the true transition density, and, therefore, the likelihood 
function, is known. The second case is a diffusion with a nonlinear drift 
[see (4.2)], where we employ the closed- form likelihood expansion from Ai't- 
Sahalia (2002) as a benchmark, because this method is known to produce 
very accurate approximations of the true transition density [see, e.g., Schnei- 
der (2006), Hurn, Jeisman and Lindsay (2007), Jensen and Poulsen (2002)]. 

To put the EML algorithm to the test, we simulate 1000 data sets from 
each of the two models [the Ornstein-Uhlenbeck process in (4.1) and the 
nonlinear diffusion in (4.2)] with K +1 = 500 observations in each data set. 
As mentioned above, in the first case, we perform an exact ML estimation 
on each of the data sets using the exact transition density in the likelihood 
function and in the second case, we first apply the closed-form likelihood 
expansion from Ai't-Sahalia (2002) to obtain an approximation to the likeli- 
hood function which is then used in quasi-ML estimation. The two models 
are given by 

(4.1) dXt = (ao - aiXt) dt + dWt, model A, 

(4.2) dXt = (ao + aiXt + a2X^) dt + dWt, model B, 

with parameter values oq = 10, ai = 2.5 for model A and = 1, oi = —1, a2 = 
— 0.5 for model B. Time t is measured in years and the consecutive obser- 
vations in the generated data sets are one month apart. In other words, 
A = ^ and we choose an auxiliary state variable for each day of the month, 
that is, M — 1 = 30. To estimate the expectations in the EML algorithm, 
we use two sets of simulations, one containing S = 1000 and one S = 200 
simulated paths. Figure 1 displays the comparison of the EML procedure 
using 1000 simulations with the direct ML estimation for model A. The bi- 
ases and standard deviations of the parameter estimates are shown in Table 
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(a) a'g (b) a} 

Fig. 1. Empirical distribution of estimates from model A. This figure shows the empir- 
ical distribution of the maximum likelihood estimator and the EML estimator over 1000 
estimations for the Ornstein-Uhlenbeck model (4-1)- Plotted are the maximum likelihood 
estimator, obtained by maximizing the true transition density using a gradient solver, and 
the EML estimator, obtained by using formula (3.5). 

Table 1 

Base cases A and B; bias and standard deviation. This table shows the estimation results 
for the parameters from equations (4-1) md (4.2) on 1000 data sets generated using the 

true transition density for model A and a very fine Euler approximation for model B 
(100 auxiliary data points). The benchmark estimations for model A are performed using 
exact maximum likelihood. The benchmark estimations for model B are obtained using 

closed-form likelihood expansions. The column "bias" shows the mean bias of the 
estimated parameters. Bias is defined as e^''> - 00 for the ith estimation. The column 
"Std. Dev. " shows the standard deviation of the parameter estimates. For the EML 
estimation, 30 auxiliary data points were used 







00 


ML/Ait-Sahalia 


EML S 


= 1000 


EML S 


= 200 


Bias 


Std. Dev. 


Bias 


std. Dev. 


Bias 


Std. Dev. 


Model A 


ao 


10 


0.3902 


1.5106 


0.2760 


1.4655 


0.2765 


1.466 




ai 


2.5 


0.0964 


0.3761 


0.0678 


0.3648 


0.0680 


0.3650 


Model B 


ao 


1 


0.1091 


0.2769 


0.0832 


0.3669 


0.0832 


0.3669 






-1 


-0.2396 


0.5124 


-0.2352 


0.5465 


-0.2353 


0.5467 




a2 


-0.5 


0.0825 


0.3525 


0.1116 


0.3247 


0.1116 


0.3246 



1 . Explicit gradients are used in the likelihood search in the case of both the 
exact likelihood function and the closed-form likelihood expansion. 

A striking observation is the higher bias of the ML estimator over the EML 
estimator. The closed-form likelihood expansion and EML display similar 
biases. No noticeable difference can be seen by choosing 200 or 1000 simu- 
lations to approximate the expectations in the EML algorithm. In Stramer 
and Yan (2007a), the authors suggest that in a related problem of Monte 
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Carlo estimation for the transition densities of diffusions, the optimal num- 
ber of simulations S is of the order M^, which, in the two cases discussed 
here, amounts to approximately 900 simulations. Also, note that the EML 
procedure takes about a second to produce the optimal parameter values 
for each of the data sets described in this subsection. The hardware used to 
perform these experiments was a PC with a 64-bit Xeon 2.8 MHz processor. 

4.2. Exploring the likelihood function. The empirical features of the dy- 
namics of equity indices such as the S&P 100 include time- varying volatility, 
a level effect for the volatility of the variance of return [see Jones (2003a)] 
and evidence for jumps [Andersen, Benzoni and Lund (2002)]. We are now 
going to investigate how a diffusion model, specified by a nonlinear SDE, fits 
the implied volatility data. In this section, we study the relation between 
the diffusion and drift parameters for each of the processes 

(4.3) dVt = K{-f-Vt)dt + a^tdWt, model I, 

(4.4) dVt = (ao + aiVt + a-iV^) dt + ^ctiT4 + asV/ dW* , model II, 

conditional on S&P 100 implied volatility data given by a time series of the 
volatility index VXO. Empirical studies in Jones (2003a) and Ai't-Sahalia 
and Kimmel (2007) have rejected the square root process (4.3) as a spec- 
ification for the variance dynamics of S&P 100. Nevertheless, the relation 
between the parameters of the square root process, conditional upon real 
data, can be investigated^ using the EML algorithm. The second model is 
a nonlinear diffusion (4.4), introduced by Ai't-Sahalia (1996), and is poten- 
tially flexible enough to accommodate the rich dynamics exhibited by the 
S&P 100 implied volatility index data. We start by transforming the SDEs 
in (4.3) and (4.4) into a form with a unit diffusion coefficient using formula 
(3.6). For model I, we apply the transformation y{x) = 2,/x/a, which yields 

(4.5) dYt = (bo^ + biYt] dt + dWt. 



Model II requires the change-of- variable y{x) = ^ ^°g(^/^°^^t^°^l +x(7 2) ^ ^Yiich 
transforms it into 

dYt = fiiYt) dt + dWt 



program written in C++, which does not depend on any numerical h- 
braries, that implements the EML algorithm in the case of the square root 
process, together with the VXO data used in this example, can be found at 
http://www.ma.ic.ac.uk/~amijatov/Abstracts/eml.html. 
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with 

f^{y) = bo I + hi " — 

+ 02 , = / =■ 

4^g-2y^(g2y^ _ cj2)2/a2a2 2^e~^yV^{e'^yV^ - G\fla2 

In the case of model I, we also perform maximum likelihood estimation with 
the true transition density of the square root variance (4.5).2 The resulting 
parameters are Q = 0.0462866, k = 5.96063 and a = 0.455324. Note that this 
parameterization ensures a stationary marginal distribution for the square 
root variance process. 

For any fixed value of volatility a, we can transform the time series for the 
volatility index using the change of coordinates y{x) = 2^/x/a and perform 
the estimation of the parameters in (4.5) using EML. This operation takes 
about one second on a personal computer (64-bit Xeon 2.8 MHz, running 
Linux) for the given data set. By repeating this process for each value of a on 
a finite grid in the interval [0.1, 1.2], we can compute the functions plotted in 
Figure 2(b). Using these functions, it is possible to regard the model in (4.3) 




10.20.30.40.50.60.70.80.9 1 LI 



(a) Maximum likelihood function / of the square 
root process as a function of (7. 



(b) 0* and fc* as functions of a. 



Fig. 2. Model I. (a) displays the maximum likelihood function (up to a proportionality 
factor) of specification (4-3) as a function of a on the x-axis, conditional on S&P 100 
implied volatility data. For a given a , the optimal values of 6* and k* are computed using 
EML. The likelihood function is computed using the SML algorithm with the Brownian 
bridge importance sampler [see Durham and Gallant (2002)]. (b) displays globally optimal 
values 9* and a* as functions of a. 



^The noncentral chi-square density is given in terms of special functions that are difficult 
to handle numerically for nonsymbolic computational tools with finite precision such as 
Fortran, C/CH — h and MATLAB. We perform this estimation using Mathematica. 
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as having a single parameter a and apply the SML algorithm [as described 
in Durham and Gallant (2002)] or some analytic likelihood approximation 
to find the likelihood function for a [see Figure 2(a)]. This application of 
EML has therefore reduced the dimension of the parameter space from 3 
to 1. Note that, in this approach, SML affords an additional computational 
advantage over other likelihood approximation methods because one can 
reuse the modified Brownian bridge paths that were generated for EML as 
an importance sampler when computing the likelihood function. Using the 
time series of implied volatilities, an analogous estimation can be performed 
for the model given by SDE (4.4). 

The analysis of the likelihood functions, together with the optimal drift 
parameters for the processes (4.3) and (4.4), provides some interesting in- 
sights into the process specification as well as the estimation method. The 
first observation is that the ML estimates obtained from the true transition 
density of the square root process (4.3) agree with the EML estimates. This 
is clear from Figures 2(a) and 2(b) at the point a = 0.455324. A visual check 
also reveals that even though EML is a numerical procedure, the implied 
globally optimal drift parameters'^ and the likelihood function are smooth in 
the parameters of the state-dependent volatility function. Preliminary EML 
estimates also suggest that the Euler approximation on a daily level is not 
sufficient for a nonlinear diffusion like the one in (4.4). Even a discretiza- 
tion of 5 subintervals per day appears too coarse. The estimates stabilize 
between 10 and 30 subintervals. This is in line with the findings in Roberts 
and Stramer (2001), Figure 4, in a similar setting. The likelihood function 
for model (4.4) is extremely flat in the diffusion parameters close to the 
optimal region (see Figure 3) and care should be taken with the estimation. 
Finally, a likelihood-ratio test applied to the two variance models reveals 
that the specification (4.4) is preferable to the square root specification. 

5. Conclusion. This paper is concerned with an approximation proce- 
dure for the likelihood function of a nonlinear diffusion, given a discrete set 
of observations. The method can be used with any algorithm for sampling 
from the law of the diffusion bridge [e.g., Beskos et al. (2006) or Bladt and 
S0rensen (2007)] and is shown to converge uniformly on compact subsets of 
the parameter space (see Theorem 1). 

We also develop a new expected maximum likelihood (EML) algorithm 
for the estimation of parameters governing a nonlinear diffusion process. 
It provides globally optimal parameter values when the drift is affine in the 



■^It is beneficial for the stability of the method to keep the random numbers fixed in 
all of the expectations arising in (3.2) and (3.3). This principle is shown to guarantee the 
convergence of the MCEM algorithm in Papaspiliopoulos and Sermaidis (2007). 
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Fig. 3. Model II. The likelihood of model (4-4) a funetion of ai (on the x-axis) and 
(J2 (on the y-axis), conditional on S&P 100 implied volatility data. For given ai and (72, 
the values aJjOi and are computed using EML. The likelihood function for model II 
given in (4-4) is computed using the SML algorithm with the Brownian bridge importance 
sampler [see Durham and Gallant (2002)]. 



coefficients and the diffusion function is constant. For diffusions with a state- 
dependent volatihty function, our method is used to express the hkeUhood 
as a function of the volatihty parameters only, thereby significantly reduc- 
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Fig. 4. Simulated densities of the Radon-Nikodym derivative dQ^'^'^/dW^'^'" of the 
law of the bridge of the diffusion Xt with respect to the law of the Brownian bridge (see 
Theorem 3 for the precise definition). The diffusion Xt used in this example is given by a 
nonlinear SDE (4-4)- The parameter value 6 is implied by the VXO data. The coordinates 
of 9 are approximately given by ao = 0.1, ai = — 1, a2 = —10 and the diffusion coefficients 
are ai =0.001 and G2 = 3. The graphs correspond to time-horizons A equal to two weeks 
and one month, and a fixed starting point x — 0.04 for the diffusion bridge and the Brow- 
nian bridge. Three different end points y of the respective bridges are chosen for each 
time-horizon. Note that the density in all of the cases is concentrated in a small neighbor- 
hood of 1, thus making the bound in (a) of Theorem 3 very tight, even for time intervals 
A as long as one month. 
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ing the dimension of the parameter space for a gradient-based solver. The 
framework is easy to implement and is guaranteed to solve the expectation 
maximization problem in a single iteration. It uses auxiliary data points 
and is based on two observations: the fact that the complete likelihood (i.e., 
the joint likelihood of the observed and auxiliary data points) of the Euler 
scheme approximates uniformly the complete likelihood of the diffusion as 
the time interval between the consecutive auxiliary data points goes to zero, 
and the fact that the law of the Brownian bridge approximates well the 
law of the diffusion bridge. Global optimality (Proposition 2), theoretical 
bounds (Theorem 3) and asymptotic results (Theorem 1) are established, 
quantifying the quality of the approximations. Additional numerical exper- 
iments suggest that the method works very well for multivariate nonlinear 
diffusions, even for large time intervals between observed data points. 

A topic for further research is the possible extension of the EML frame- 
work to the estimation of jump-diffusions. Instead of using the law of the 
Brownian bridge, a semi-nonparametric density [see Gallant and Tauchen 
(2009, 2006)] could be used to approximate the conditional density p{xrj^ \ 
Xt2,Xto) =v{xr2 I Xr^)p{xr^ \ Xro)/p{xr2 \ Xro),0 < tq < Ti < T2, of the Cor- 
responding bridge process with jumps. The complete likelihood function 
would then be obtained by conditioning on a high-frequency discretization 
(i.e., using many auxiliary state variables) of the jump-diffusion which would 
identify well the jump parameters. 

APPENDIX A: PROOF OF THEOREM 1 

Recall that we have a diffusion Xt which is a solution of the SDE dXt = 
fi{Xt, 6) dt + dWt, where //(•, 0) : M — >■ M is a bounded Lipschitz function with 
bounded first and second derivatives. Without loss of generality, we can 
assume that the parameter space O is a compact subset of M^"*"-^. For s >t, 
let 'Kt,s{x I xq,9) denote the probability density function of Xs conditional 
on Xt = xq. It is well known that such a density exists [see (7) and (8) in 
Rogers (1985)] and, by Girsanov's theorem, can be expressed as 

(A.l) vr,,+5(x I xo,^) = -i=e^(---»)'/(25)e^^>(^''')''"<I>,(5,xo,rr), 

v27ro 

where 



(A.2) ^e{5,xo,x):=¥. 



exp^-^y g0{xQ + u{x - xq) + y/5W^) du 



Here, denotes the Brownian bridge := Wu — uWi, for u G [0, 1], and 
the function gg is given by ge{u) := fi'{u,6) + jj.{u,6)'^. Our task is to prove 
that the sum 

(A.3) ^ E^,^, ^og [^^Xs^'^^^y^Xs, + 6,{Xs,,e),5)) 
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converges to zero uniformly for all 9 in the compact parameter space 0, 
where Qgp is the law of the bridge of the diffusion Xt, for t G [0, A], which 
starts at Xq = xq and finishes at = x/\ for any fixed pair of real numbers 
xo,XA G The function (j){y;x,6) in this expression is the normal density 
function with mean x and variance 6. 

The first step in the proof of Theorem 1 is to understand the integrand 



(A.4) 



+ / fi{u,e)du + log{^g{6,x,y)). 



We start with the following claim. 



Claim 1. The integral of the drift can be expressed as fi{u,9) du = 
{y — x)iJ,{x,9) + ^{y — x)'^IJ,'{x,9) + {y — x)'^hi{x,y), where hi{x,y) is a bounded 
measurable function which is linear in 9. 

Let iJ,{u, 9) = ii{x,9) + {u — x)fi'{x, 9) + ^{u- x)'^n"{£,x,u, 9) be the Taylor 
approximation of order 2 of the drift ^{-,9). The point ^x,u lies in the interval 
{x,u) C M [or in (u, z) , if x is larger than u] . For any fixed y in M, we can inte- 
grate this representation of n{-,9) to obtain the representation of the integral 
which is given in Claim 1. We need to check that the function h\{x,y) := 
l^y\yi ~ x)'^b^ {u,x) du, for x j^y, is bounded and measurable. Here, 

b^{u,x) is given by the quotient b^{u,x) := 2 ^^"'^^ ^^^(f^J)" x)fj, {x,e) ^£ ^ _^ 

and is zero otherwise. Note that the function b^ is bounded since fi"[-,9) is 
bounded and linear in 9. Since the set @ is compact, the estimate \hf{x, y)\ < 
(y-x)^ Ix(^ ~ x)"^ '^'^ — § holds for all ?/ > X and some constant C. A sim- 
ilar bound holds for y < x. It follows from the definition of b^ that it is 
measurable on M x M since it is continuous outside the zero measure set 
{{x,x):x G M}. Fubini's theorem implies that hf{x,y) must therefore also 
be measurable. This proves Claim 1. 

The next task is to relate the asymptotic behavior of the function log($6i((5, 
x,y)) to the drift ^i{-,9). This will be achieved in Claims 2 and 3. 

Claim 2. There exist constants 5o > and Aq > such that for all 
6 G [0,So], we have the following equality: 

log{^g{6, X, y)) + - geix + u{y - x)) du = 6^/'^A'^{x, y), 
^ Jo 

where A^{x,y) is a measurable function of x and y that satisfies \A^{x,y)\ < 
Aq for all x,y gM. and all 9 gQ. 
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By Lagrange's mean value theorem, we obtain the following equalities: 

ge{x + u{y - x) + VdW^) - ge{x + u{y - x)) 

= V5W%{x + u{y-x) + i) 
= ^5W^X'{x,y,u), 

where the real number ^ G (0, ^fdW^) depends on x^y^ 5 and W°. Note that 
the random variable {x,y,u) is measurable since it coincides with the 
quotient [ge{x + u{y — x) + V6W^) — g0{x + u{y — x))]/ {^f5W^) on a com- 
plement of the set {W^ = 0}, which has probability zero. This representation 
also implies that for any fixed path of W^, the function X^{x, y,u) is jointly 
measurable in the variables x, y and u and that it is quadratic in the parame- 
ter 9. The assumptions on the drift n{-,9) and the above equality imply that 
the random variable X^ {x,y,u) is bounded for all possible triplets {x,y,u). 
Let the variable Y^{x, y) denote the integral Y^{x, y) := Jq X^{x, y, u)Wj^ du 

^00 I \k-l 



2 > X 



and let Z^{x,y) denote the random variable Z^{x,y) := ^ J2T=i(~ 

— ^^j ^ . The left-hand side of the equality in Claim 2 can now be rewritten 
as 



(A.5) logE 



§3/2 

exp( -—Y\x,y) 



logE{l- 6^/^ Z\x,y)]. 



From the definition of and the fact that X^{x, y, u) is bounded by a posi- 
tive constant C independent of 0, we can see that |y^(x,y)| <C{^^\Wu\du+ 

\Wi\). This implies the bound \Z%x,y)\ < el^'(^'2^)l < e'^/o I^^M^e'^l'^il, which, 
combined with the Cauchy-Schwarz inequality, yields 

(A.6) E[|Z^(x,y)|]<E[e2^/olH^.M-]i/2jE[g2C|m|]i/2, 

This means that the expectation of Z^{x,y) exists and is bounded above 
and below uniformly in x and y for all parameters 6 G Q. Therefore, there 
exists (5o > such that — ^ < SK[Z^ {x,y)] < ^ holds for all x,y, if (5 G [0,5o]- 
The right-hand side of (A.5) can now be expressed as 

logE[l - 5'/^Z'ix, y)] = log(l - 5'/^E[Z%x, y)]) 

^,_,E[Z'{x,y)] 



oo 



k=l 



since the series log(l — z) = X^^Li ■^''/^ converges uniformly on compact 

subsets of (-1,1). We can define A^{x,y) := Er=i('^^^^)''"^IE[Z^(x, y)] V^, 
which is therefore measurable and uniformly bounded for all x, y and 6 G 
[0,5o]- If ^-Iso follows from the bound in (A.6) that there exists a constant 
Ao> such that \A^{x,y)\ < Aq for all x,y and all parameter values 6 G 0. 
This proves Claim 2. 
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Claim 3. The equality gg{x + u{y — x)) du = g0{x) + {y — x)h2{x,y) 
holds for all x and y and some bounded measurable function h2{x,y) which 
is quadratic in the parameter 0. 

By Lagrange's mean value theorem, we obtain gQ{x + u{y — x)) = ge{x) + 
5e»(0^(y ~ where |^ — x| < u\y — x\. It is clear that [ge{x + u{y — x)) — 
9e{x)\/{y — x) is a measurable function defined on ((R x R) — {{x,x) :x £ 
R}) X [0,1]. Since the diagonal {{x,x):x G R} has Lebesgue measure zero 
in R X R, we can extend the quotient to a bounded measurable function on 
R X R X [0, 1]. If we integrate the above equality over the interval [0, 1], we 
obtain gQ{x + u{y — x)) du = ge{x) + {y — x) Jq gg{^)udu. It follows from 
Fubini's theorem that the last integral, denoted by h2{x,y), is a measurable 
function of x and y. It is also clear that h2{x,y) is bounded, since the 
integrand {x,y,u) gg{^)u is bounded on its domain, and that h2{x,y) is 
quadratic in the parameter 6. This completes the proof of Claim 3. 

We can now apply Claims 1, 2 and 3 to the equality in (A. 4) to obtain 
the following representation: 

f n,t+5{y I x,e) 

\(p{y;x + 5fi{x,e),6) 
= 0) [{y - x)^ -6] + {y- xfh{{x, y) 

-\5hl{x,y){y-x) + 5^'^A\x,y), 

where h^{x,y), h2{x,y) and A^{x,y) are bounded measurable functions. The 
sum in (A. 3) can now be decomposed naturally into four sums. The first 
three will tend to zero as 6 goes to zero, by Lemma 4. The fourth one can 
easily be bounded as follows: 



53/2 



A/ 5-1 



k=l 



= V6iA-6)Ao 



where the constant ^0 is as in Claim 2 and hence converges to zero uniformly 
in 9. Also, note that since the functions fi'{x,6), h\{x,y) and h2{x,y) are at 
most quadratic in the parameter 9 which takes values in a compact region 
0, it is enough to state and prove Lemma 4 for functions that do not depend 
on 9 and still obtain uniform convergence in 9. The proof of Theorem 1 will 
therefore be complete as soon as we prove the following lemma. 

Lemma 4. Let : R x R — )• R be one of the following functions: 
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(1) fs{x,y) = h{x){{y — x)^ — 5) for any hounded dijferentiable function h 
with bounded first derivative; 

(2) fs{x,y) = h{x,y){y — x)^ , where h{x,y) is a bounded measurable func- 
tion; or 

(3) fs{x,y) = 5h{x,y){y — x), where h{x,y) is as in (2). 
The following then holds: 

A/5-1 

lim ^ EQgJ/5(Xfe5,X(fc+i)5)] =0. 

^ k=l 

Proof. Let A{5) denote the sum in the above hmit. Since the diffusion 
Xt is a Markov process, we can express A[6) as 

A/5-1 

(A.7) A{5)= EQ,jEQ,J/5(Xfc5,X(,+i)5)|Xfc5,XA]]. 

k=l 

Conditional densities, which arise in the expectations in (A.7), can be ex- 
pressed using the formula in (A.l) for the transition density of Xt- We can 
therefore rewrite A{5) as 

7ro,fc5(2;|xo)vrfc5,A(2;A|^ 



k=i -^^ 



f , , .'^kS,(k+l)5{y\x)TT{k+l)5,Ai^^\y) , 
X / f5ix,y) J 7—^ dy. 

To simplify the notation, let := ^ — 1. Note that we are always choosing 
6 so that A^ is an integer. The above expression for A{6) implies that the 
lemma will follow if we prove the following equalities: 

7V-1 ^ 
lim > 



{xA - yf 



fs{x,y) exp 



2A{{N-k)/{N + l)) 
(y-x)'^ (x-xo)^ 



(A. 



2A/(Ar + l) 2A(A:/(Ar + 1)) 
x^(A^,,,xa)cI>(^,x,, 
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lim / fs(x,XA) — , : exp 



(XA — x)'^ {x — Xq^'^ ' 



s^oj^"- ' JbiA-S) \ 26 2{A-6)J 
(A.9) 

X ^{6,x,xa)^{A - 6,xo,x)dx = 0. 

Condition (A.9) corresponds to the last summand in (A. 7), while condition 
(A. 8) accounts for the rest of the sum in (A. 7). 

Let us start by proving (A.9). Note that since, by assumption, the drift /i is 
bounded and has a bounded first derivative, the function $ must be bounded 
(as mentioned above, we are omitting dependence on the parameter Oq as it 
is fixed throughout). The function h in the definition of fs is also bounded, 
so the absolute value of the integral in (A.9) is smaller than 6C |f ^ — 
lle-'^y^dv, (53/2c7/jg|7;3|e-^V2(i^ and J^\v\e-''^ /"^ dv for fs given by 

(1), (2) and (3), respectively. These bounds are obtained by the change of 
variable v = {xa — x)/y/d in the integral in (A.9). In each of the three cases, 
the constant C is independent of S. This proves (A.9). 

We are now left with the harder problem of proving (A. 8). Note that the 
integrand is absolutely integrable over M x M, which, by Fubini's theorem, 
implies that we are free to choose any order of integration in the double 
integral. We will first prove case (1), where fs{x,y) = h{x){{y — x)"^ — 6). 
This case is harder than (2) and (3), which will be dealt with at the end of 
the proof. 

By substituting u = {x — y)/ and integrating over the state variable x 
first, we can rewrite the sum in (A. 8) as 

(A.IO) B{u,N):= [ {u"^ -l)e-''^^^C{u,N)du, 



(A.ll) C{n,N):=Y^ 
F{u,k,N) :-- 



F{u,k,N) 




X exp 



(A.12) 



{u^A/{N + l) + y-xo) 
2A{k/{N + l)) 

{xA-yf 

2A{iN-k)/{N + l)) 



2 



iv + 1'^' / \ iv + 1 V iv + 1 
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We shall now prove the following statements, which will imply the lemma: 

(a) lim^^^Ef=Y^7^=^; 

(b) liniTv-i^oo C{u, N) exists for every u G M; 

(c) limAr_j.oo ^(n, iV) is independent of u; 

(d) lim^v^oo B{u, N) = /jg(n2 - l)e-"'/2(lim^^oo C(n, N)) du. 

Note that (c) and (d) together imply that limAr_j.oo B{u^ N) = (limAr_j.oo C(ti, 
N)) - l)e-"'/2 ^ 0, which proves the lemma in the case where fs 

is given by (1). In order to prove (a), we need to consider the graph of 
the mapping x i— )• 1/ ^/x{N — x) in the interval (0, N). We can, without loss 
of generality, assume that N is an even integer by choosing the decreasing 
time interval 6 correspondingly [recall that 5 = A/{N + 1)]. By inspecting 
the area under the graph of this map, it is easy to see that the following 
inequalities must hold: 

dx 2 f^-' dx 



^/x{N -x) N y/x{N - x) 

N-l 

/U( AT 



N-l 

^ ^k{N-k) 



k= 



dx /-^ dx 



\/x{N - x) Jn/2+1 \/x{N - x) 



It is easy to check that ^ ^ = 2arctan(Y'^^) for all x S (0,y), where 

y is any positive real number. Using this formula to calculate the integrals 
in the above inequalities, we obtain the following relations: 

2(arctan(ViV - 1) - arctan(l/VA^ - 1) + 2/iV) 



N 

< 




^ ^k{N-k) 
( 

< 21 arctan(l) + - - 

for every even integer N . It is now clear that the limit of the sum in (a) 
equals vr. 

We are now going to prove (b) and (c). Note that we can assume, without 
loss of generality, that the function h in (A. 12) is nonnegative [i.e., h{y) > 
for all y G M] because we can express it as /i = — /i_ , where = 
max{/i(y),0} and h^{y) = max{ — 0} are nonnegative functions. Note 
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that the functions /i+ and /i_ are nondifferentiable on a set which is at most 
countable^ and therefore has Lebesgue measure zero. 

We will start by showing that the limit limjv-5>oo C'CO, -/V) exists. Since h 
is nonnegative, the same is true of F(0,k,N). Using the same reasoning as 
above, we obtain the following inequalities: 



F{0,k,N) I 

k=l •'^ 



+ Fi^O,-,Nj-+ Y F{0,k + 1,N) 



'■^+1 dx 



k=N/2 



(A.13) 



< Y FiO,k + l,N) [ 

^ r^+l dr 

1 ^^/o I 1 



dx 



k=N/2+l 



y/x{N - x) 



Since the function x i— )• l/y^x{N — x) is integrable on the interval [0, A^] and 
F is bounded, the inequalities in (A.13), together with the next claim, imply 
that the limit li'nij\f^ooC{0,N) exists. 

Claim. 

lim 5] |F(0,A; + l,Ar)-F(0,fc,Ar)| / , ^ 
f^i Jk ^x{N-x) 

lim y \F{0,k + l,N)-F{0,k,N)\ / 



N- 

k=N/2+l 



^The set A, where h+ is not differentiable, can be expressed in terms of h as follows: A — 
h~^(0) n (/i'~^(0,co) U cxD, 0)). This is because h+ is nondifferentiable precisely at 
those zeros of h where the derivative h' is nonzero. Since h' is continuous, the set h'~^{0, oo) 
[resp., 00,0)] is open in R and can therefore be expressed as a disjoint union of 

open intervals. Recall that there can only be countably many disjoint open intervals in R. 
Since function h is monotonic on each of these intervals, there can be at most one zero in 
each interval. Therefore, the set A of such zeros of h must be countable. 
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The two cases in the claim are very similar and can be deduced using 
the same methodology. We will give details only for the first case. Let k G 
{1, . . . , Y ~ 1} and recall that $ and h are bounded functions with bounded 
derivatives. We can therefore obtain, using (A. 12), the following estimate: 



(A.14) 



|F(0, k + l,N)- F(0, k,N)\<Y, Cih{k, N), 



4 = 1 



where Cj, i = 1, . . . ,4, are constants independent of k and N, and Ii{k,N), 
i = 1, . . . , 4, are integrals given by: 



h{k,N) :-- 



exp 



exp 



exp 



exp 



2A{{N-k-l)/{N + l)) 

( {xA - y? 

\ 2A{{N-k)/{N + l)) 
'2A((A: + l)/(iV + l)) 

{y-XQ^ 



dy; 



X exp 



$( A 



2A(A:/(iV + l)) 

(xA-y)^ 
'2A{{N-k)/{N + l)) 



N-k-1 



V N + 1 



X exp 



-,y,XA 



{xA - yf 



A 



dy, 



N-k 



2A{{N-k)/{N + l)) 



V N + 1 

dy, 



,y,XA 



hik,N) :-- 



X exp 



N + V 

(- 



$ A 



k 



N + 1 



<xo,y 



(xA - yf 



dy. 



V 2A{{N-k)/{N + l)) 

We can now estimate the integrals Ii{k, N), i = 1, . . . , 4, for any k G {1, . . . , 
1}. For Ii{k,N), we get 

{xA-yf 



N 



h{k,N) < / exp 



1 — exp 



A{N/{N + 1)) 
{xA - y? 



N + 1 



2A {N-k-l){N-k) 



dy 



< I exp(-(xA-y)VA) 
Jm. 
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'-^"pv — A iv(iv/2 -I))) '^y- 

The last integral in this inequality converges to zero, by the dominated 
convergence theorem. Therefore, Ii{k,N) goes to zero with increasing N 
uniformly in k. 

In order to bound l2{k, N), note that the function x i— )• exp{ — {y — xq)'^ / {2Ax)) 
has a derivative which is bounded (for each ?/ G M) on the interval x € [0, 1/2] . 
Since k G {1,. . . ,N/2 — 1}, the value x = k/{N + 1) lies in the interval 
(0,1/2). For k such that k/{N + 1) < {N + 1)^^/'^, we get the following 
bound: 



/2(A:,iV)< / fexp 



{y - xq) 



2 



2A2(iV + 1)-V4 



_ V27rA(l + ^/2) 
(Af + 1)1/8 ' 

which is uniform in all k that satisfy the above condition. 

For k .,N/2 - 1} such that k/{N + 1) > {N + 1)"^/^, the La- 

grange mean value theorem implies the existence of some ^ in the interval 
(Tvti'fer) such that 

(y-xo)^ \ ( {y-xof 



exp — — — — — — — — exp 



2A(A:/(iV + 1) + l/(Af + 1)) ; 2A(A;/(iV + 1)) 

< 1 (l/ - ^o)% -fa-xo)V(2A£) 



< 



iV + 1 2A^2 

1 {y-xofVN+l 



N + l 2 A 

The last inequality is independent of k and holds for every y G M. It therefore 
implies that 

/2(A:,iV)< ^ 



for ah /c G {1, . . . , f - 1} which satisfy the inequality > (iV + 1)"^/^ (the 
constant C in the above expression is independent of k). This, together with 
the inequality in (A.15), implies that l2{k,N) converges to zero uniformly 
in k. 

It is shown in Rogers (1985) (see page 160) that the partial derivative 
^ of the function ^{S,xo,x) defined in (A. 2) equals the expectation of the 
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derivative 



d_ 
dS 



exp 



Is 

2 



g{xo + u{x - xo) + V6W^) du 



Using the fact that g is bounded and has a bounded first derivative, we 
conclude that ^ is also bounded. Therefore, we can apply Lagrange's mean 
value theorem to prove that Iz{k, N) and /4(/c, A^) converge to zero uniformly 
in A: as goes to infinity. 

We have just shown that limAr^oo M(iV) = 0, where M(A^) := max{|F(0, A; + 
l,N) - F(0,fc,A^)| :A; = l,...,f - 1}. The inequalities 



N/2~l 

0< Yl \F{Q,k + l,N)-F{Q,k,N)\ 

k=l 



k+1 



dx 



k \/x{N - x) 



< M{N) 



N/2 



dx 



1 v^x(iV-x)' 



together with (a) and the fact that M{N) converges to zero, prove the claim. 
We have therefore proven that the limit limAr_>oo C(0, A^) exists. 

In order to complete the proof of (b) and (c), we need to understand the 
behavior of 



for all u S M. The key observation here is that every instance of the parameter 



u in definition (A. 12) of the function F(u,k,N) is of the form 



therefore natural to expect that the partial derivative ^{u,k,N) tends to 
zero with increasing A'^ for every fixed value of u. This is precisely what we 
shall now prove. 

It follows from (A. 12) that the inequality 



. It is 



(A.16) 



dF 
du 



{u,k,N) 



< 



1=1 



holds, where Dq is a positive constant independent of u. A; and A^, and the 
integrals Jj, f = 1, . . . , 4, are of the following form: 



J2:-- 



$0 A- 



k 



N + 1 



,Xo,U 



A 



N + 1 



+ y 



exp 



2A 



dy, 



h' u 



A 

FTi 



+ y 



exp 



(xA -yf 
2A 



dy, 
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J: 



3 • = 



$1 



A 



iv + r 



A 



iV + 1 



+ y,y 



exp 



2A 



dy; 



tVA/(jV + l)+j/-xo| 
A(A:/(iV + l)) 



exp 



^^A/(iV+l)+y- 



2A(A:/(iV + l)) 



dy. 



The functions $i and $2 denote the derivatives of the function <I> given 
in (A. 2) with respect to the first and second state variable, respectively. It 
is shown in Rogers (1985) that these derivatives exist and that they are 
bounded. 

In order to obtain the bound in (A. 16), we had to exchange the order of 
differentiation and integration [see the definition of function F in (A. 12)]. 
This can be justified by the dominated convergence theorem since the dif- 
ference quotients are bounded above by the function 



y^ 



sup 

M'e(u-i,«+i) 



du 



u',y) 



exp 



(xA - yf 
2A 



for each u € M, where / is the integrand in (A. 12). This function is clearly 
in L^(]R) and the dominated convergence theorem applies. Also, note that 
the integral J2 is well defined because, as noted earlier (see page 23), the 
function h is nondifferentiable only on a set of measure zero. 

We can now proceed to estimate the integrals Jj, z = 1, . . . ,4. Since the 
functions $i,<I*2 and h' are bounded, the integrals Ji, J2 and J3 are also 
bounded above by a constant for all u, N and k. By introducing a change 

of variable v = {u^J'-^j^ + y — xq) / we can transform J4 into the 

integral Jj^ \v\e~^'^ /"^ dv, which is finite and independent of n, N and k. Com- 
bining these findings with (A.16), wecan conclude that \^{u,k,N)\ < ^^^^ 
for some constant D independent of n, N and k. 

By the fundamental theorem of calculus, we have C{u,N) = C{0,N) + 
lo ^('^' ^) dv. Using the bounds on WG havG just obtOjiiiGcl, WG find that 
the following inequalitiGS hold: 



\C{u,N) 



c(o,iv)|< V 



1 



^ ^k{N - k) Jo 



OF 



dv 



< 



Du 



VnTT fT{ ^Jk{N-k)' 



Since the sum on the right-hand side is convergent and since we know that 
the limit L := lim7v-!.oo (^(0, iV) exists, it follows from this inequality that 
limAr_>oo C{u,N) = L for every u in M. This proves (b) and (c). 
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Statement (d) follows from the dominated convergence theorem if we can 
show that the function C{u,N) is bounded for n G M and all integers N. 
This is a consequence of the definition of C{u,N) given in (A. 11), the fact 
that F{u,k,N) is bounded [see (A. 12)] and statement (a) which was proven 
above. This concludes the proof of the lemma in case (1), where fs{x,y) = 
h{x){{y — xY — S), because, by (c) and (d), we get lim7v-5.oo ^('U, A^) = 
(lim7v-i>oo C{u, N)) f^iu^ — l)e~" du = 0, which is equivalent to the state- 
ment in (A. 8). 

Cases (2) and (3) are much simpler. Again, what we need to show is that 
(A. 8) holds for the corresponding choices of fs{x,y). By introducing the 
substitution u = , where 6 = j^-^- , the absolute value of the integral in 

(A. 8) of case (2) [resp. (3)] is transformed to y'^^x f^\u\'^e~^''^'^C{u,N)du 

[resp., ^J'^^ f^\u\e~'^'' ^'^C{u, N) du], where the function C{u,N) is as in 
(A. 11), while the function F{u,k,N) is as in (A. 12) with the exception of 
the integrand h{u<J'-^j' + y,y), which now becomes a bounded function of 
two variables. It is clear that the equality in (A. 8) will hold as soon as we 
see that the function C{u,N) is bounded for all n G M and all integers N. 
This follows from definition (A. 11), from statement (a) on page 22 and from 

the inequality \F{u,k, N)\ < D\ exp ( — ^'^^^^'^ ) dy , which holds for some 
constant Di . This concludes the proof of the lemma. □ 



APPENDIX B: PROOF OF THEOREM 3 

As mentioned in the discussion following Theorem 3, it follows form 
Lemma 1 in Beskos et al. (2006) and expressions (A.l) and (A. 2) in Ap- 
pendix A that the Radon-Nikodym derivative dQ^'^'^/dW^'^'^ exists and 
is of the required form. The bound in (a) follows from the Cauchy-Schwarz 
inequality on the Hilbert space L^(W'^'^'^) since the Brownian bridge is a 
Gaussian process with bounded variance and the function G has at most 
polynomial growth. In other words, the random variable . . . , VF^J^^) 

is an element in L^(W^'^''^), as is the Radon-Nikodym derivative since the 
drift n{-,6) is bounded above and below. 

Part (b) of the theorem is a consequence of the inequality in part (a), 
Jensen's inequality and the elementary observation < E[X^] — E[X]^ < 
, which holds for any random variable X that takes values in a bounded 

interval [a,b], applied to X := ^ , — 1- Here, the variable Lg is as 

defined in Theorem 3. This concludes the proof. 
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